Quantifying the Specificity of Near-duplicate Image Classification Functions
Authors
Abstract
There are many published methods for detecting similar and near-duplicate images. Here, we consider their use in the context of unsupervised near-duplicate detection, where the task is to find a (relatively small) near-duplicate intersection of two large candidate sets. Such scenarios are of particular importance in forensic near-duplicate detection. The essential properties of such a classification function are performance, sensitivity, and specificity. We show that, as collection sizes increase, specificity becomes the most important of these: without very high specificity, huge numbers of false positive matches will be identified, which makes even very fast, highly sensitive methods completely useless. Until now, to our knowledge, no attempt has been made to measure the specificity of near-duplicate finders, or even to compare them with each other. Recently, a benchmark set of near-duplicate images has been established that allows such an assessment by providing a near-duplicate ground truth over a large general image collection. Using this, we establish a methodology for calculating specificity. A number of the most likely candidate functions are compared with each other, and accurate measurements of sensitivity vs. specificity are given. We believe these are the first such figures to be calculated for any such function.
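The quantities the abstract relies on can be illustrated with a minimal sketch. It is not the paper's methodology; it only applies the standard confusion-matrix definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) over every cross pair of two candidate sets, given a ground truth of true near-duplicate pairs. Everything named here is a hypothetical assumption for illustration: the toy `FEATURES` (each "image" reduced to one scalar), the `GROUND_TRUTH` pairs, the thresholded-distance classifier from `make_threshold_classifier`, and the helper `expected_false_positives`.

```python
from itertools import product

# Hypothetical toy data: each "image" is reduced to one scalar feature.
# Real near-duplicate finders use far richer descriptors.
FEATURES = {"a1": 0.10, "a2": 0.50, "a3": 0.90,
            "b1": 0.11, "b2": 0.52, "b3": 0.30}

# Hypothetical ground truth: the cross pairs that really are near-duplicates.
GROUND_TRUTH = {("a1", "b1"), ("a2", "b2")}


def make_threshold_classifier(threshold):
    """Return a hypothetical classifier: match if feature distance <= threshold."""
    def is_match(a, b):
        return abs(FEATURES[a] - FEATURES[b]) <= threshold
    return is_match


def confusion_counts(set_a, set_b, is_match, ground_truth):
    """Count (TP, FP, TN, FN) over every cross pair (a, b), a in set_a, b in set_b."""
    tp = fp = tn = fn = 0
    for a, b in product(set_a, set_b):
        predicted = is_match(a, b)
        actual = (a, b) in ground_truth
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1  # flagged, but not a true near-duplicate
        elif actual:
            fn += 1  # true near-duplicate that was missed
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of true near-duplicate pairs that are found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-duplicate pairs that are correctly rejected."""
    return tn / (tn + fp)


def expected_false_positives(n_a, n_b, n_true_pairs, spec):
    """Expected false positives when every pair across two sets is compared."""
    negatives = n_a * n_b - n_true_pairs
    return (1.0 - spec) * negatives


if __name__ == "__main__":
    set_a, set_b = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
    for threshold in (0.05, 0.25):
        tp, fp, tn, fn = confusion_counts(
            set_a, set_b, make_threshold_classifier(threshold), GROUND_TRUTH)
        print(threshold, sensitivity(tp, fn), specificity(tn, fp))

    # Two collections of a million images each: even 99.99% specificity
    # leaves on the order of 10**8 false positive pairs.
    print(expected_false_positives(10**6, 10**6, 1000, 0.9999))
```

Loosening the threshold from 0.05 to 0.25 keeps sensitivity at 1.0 here but drops specificity from 1.0 to 5/7. The last line shows why specificity dominates at scale: the number of non-duplicate pairs grows with the product of the two set sizes, while the true intersection stays relatively small, so the false positive count is driven almost entirely by 1 - specificity.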
Similar resources
Identification of MIR-Flickr Near-duplicate Images - A Benchmark Collection for Near-duplicate Detection
There are many contexts where the automated detection of near-duplicate images is important, for example the detection of copyright infringement or images of child abuse. There are many published methods for the detection of similar and near-duplicate images; however it is still uncommon for methods to be objectively compared with each other, probably because of a lack of any good framework in ...
Hash Functions for Near Duplicate Image Retrieval
This paper proposes new hash functions for indexing local image descriptors. These functions are first applied and evaluated as a range neighbor algorithm. We show that it obtains similar results as several state of the art algorithms. In the context of near duplicate image retrieval, we integrated the proposed hash functions within a bag of words approach. Because most of the other methods use...
Detection and Classification of Breast Cancer in Mammography Images Using Pattern Recognition Methods
Introduction: In this paper, a method is presented to classify the breast cancer masses according to new geometric features. Methods: After obtaining digital breast mammogram images from the digital database for screening mammography (DDSM), image preprocessing was performed. Then, by using image processing methods, an algorithm was developed for automatic extracting of masses from other norma...
Near Duplicate Image Identification with Spatially Aligned Pyramid Matching
A new framework, termed Spatially Aligned Pyramid Matching, is proposed for Near Duplicate Image Identification. The proposed method robustly handles spatial shifts as well as scale changes. Images are divided into both overlapped and non-overlapped blocks over multiple levels. In the first matching stage, pairwise distances between blocks from the examined image pair are computed using SIFT fe...